Author: Vivek Roy
Maximum possible score = 100 points
The overall goals of this programming assignment are to explore essential image processing methods and to blend two images using Gaussian and Laplacian pyramids. You should note any connection between the implementation of these methods and the theory presented in the lectures.
This assignment was modeled, in part, on the approach covered in an Image Blending assignment offered in an Introduction to Visual Computing course by Dr. Fernando Flores-Mangas.
#import libraries for this assignment
from skimage.io import imread
import matplotlib.pyplot as plt
import numpy as np
from skimage.filters import gaussian
from skimage.transform import resize
We start this assignment with loading an image file and visualizing the values that constitute it. In the course, you learnt about how an image is a matrix of values. We will validate that claim on some image files. We will also create an image from a matrix of values and save that to disk. In the process, you will learn about some common Python functions used for loading, cropping, visualizing, and writing images.
from skimage.io import imread
import os
Use the imread function to load the following three images from the img subfolder:
img/front.jpg
img/center.jpg
img/back.jpg
### GRADED
### YOUR ANSWER BELOW
### YOUR SOLUTION HERE
front = None
center = None
back = None
### BEGIN SOLUTION
front = imread(os.path.join("img", "front.jpg"))
center = imread(os.path.join("img", "center.jpg"))
back = imread(os.path.join("img", "back.jpg"))
### END SOLUTION
### BEGIN HIDDEN TESTS
assert len(front.shape) == 3 and len(center.shape) == 3 and len(back.shape) == 3, "The images are color images. Do not load them as grayscale."
assert front.shape[-1] == 3 and center.shape[-1] == 3 and back.shape[-1] == 3, "The images are color images. Do not load them as grayscale."
assert front.dtype == "uint8" and center.dtype == "uint8" and back.dtype == "uint8", "Load the images as uint8. This is the default. Do not typecast."
from skimage.util import compare_images
_front = imread(os.path.join("img", "front.jpg"))
_center = imread(os.path.join("img", "center.jpg"))
_back = imread(os.path.join("img", "back.jpg"))
tol = 1e-3
assert compare_images(_front, front).sum() < tol, "Incorrect image loading"
assert compare_images(_center, center).sum() < tol, "Incorrect image loading"
assert compare_images(_back, back).sum() < tol, "Incorrect image loading"
print("Correct!")
### END HIDDEN TESTS
Correct!
Now that we have loaded three JPG files into variables, let us dissect these objects. The imread function loads an image into a numpy ndarray, a data structure representing a multidimensional array. If you print the shape of any of the three images above using the .shape attribute of ndarray instances, e.g. front.shape, you will see (2000, 3000, 3).
The value 2000 represents the height of the image, 3000 represents the width of the image while the 3 represents the number of channels — the Red, Green and Blue (RGB) channels of a colored image.
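To see this (height, width, channels) layout without the image files, here is a minimal sketch on a small synthetic array. The array and its pixel values are made up for illustration; a real photo loaded with imread has the same layout, only larger.

```python
import numpy as np

# A synthetic 4 x 6 RGB "image": 4 rows (height), 6 columns (width), 3 channels.
img = np.zeros((4, 6, 3), dtype=np.uint8)

# Make the pixel at row 1, column 2 pure red: R=255, G=0, B=0.
img[1, 2] = [255, 0, 0]

print(img.shape)            # (4, 6, 3)
print(img[1, 2].tolist())   # [255, 0, 0]
print(img[..., 0].shape)    # (4, 6): the red channel alone is a 2-D matrix
```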
A numpy ndarray can be indexed in a lot of ways (See Numpy User Guide for details). We will index into them to crop a small portion of an image.
front.shape
(2000, 3000, 3)
Use array indexing to crop the center image to rows (y) 800 up to but not including 900 and columns (x) 1300 up to but not including 1400. NumPy indexes rows first, then columns. Make sure to keep all color channels. Store this in a variable named crop.
As a sanity check, you can test the shape of the cropped image with crop.shape.
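The row-then-column order is easy to get backwards, so here is a minimal sketch on a synthetic array with the same shape as the photos. The array contents are invented: the red channel encodes each pixel's row index (mod 256) and the green channel its column index (mod 256), so you can check which region a crop came from.

```python
import numpy as np

# Synthetic stand-in with the photos' shape: 2000 rows x 3000 columns x 3 channels.
fake = np.zeros((2000, 3000, 3), dtype=np.uint8)
fake[..., 0] = (np.arange(2000) % 256)[:, None]  # red stores the row index mod 256
fake[..., 1] = (np.arange(3000) % 256)[None, :]  # green stores the column index mod 256

# Rows 800..899 (y), columns 1300..1399 (x), all channels; stop indices are exclusive.
crop = fake[800:900, 1300:1400, :]

print(crop.shape)                 # (100, 100, 3)
print(crop[0, 0, :2].tolist())    # [32, 20]: row 800 % 256 = 32, column 1300 % 256 = 20
```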
### GRADED
### YOUR ANSWER BELOW
### YOUR SOLUTION HERE
crop = None
### BEGIN SOLUTION
crop = center[800:900, 1300:1400, :]
### END SOLUTION
### BEGIN HIDDEN TESTS
assert len(crop.shape) == 3, "Make sure to have all color channels."
assert crop.shape[-1] == 3, "Make sure to have all color channels."
assert crop.dtype == "uint8", "Do not typecast the crop in any way."
assert crop.shape == (100,100,3), "Wrong crop. Check crop.shape for sanity check."
from skimage.util import compare_images
_crop = center[800:900, 1300:1400, :]
tol = 1e-3
assert compare_images(_crop, crop).sum() < tol, "Incorrect image cropping"
print("Correct!")
### END HIDDEN TESTS
Correct!
import matplotlib.pyplot as plt
Use plt.imshow(image) to plot/show the image. Plot the crop image we generated in the previous step. You should see the tip of a pyramidal object with blue and red sides.
Note: If not in a notebook environment, you might have to use plt.show() function before you see anything.
plt.imshow(crop);
You can now see any image inside of Python. Go ahead and see the three images (front, center and back) we loaded before. As you examine these images, be sure to look for which regions of each image are in sharp focus and which regions are blurry.
plt.figure(figsize=(18,12))
plt.imshow(front);
plt.figure(figsize=(18,12))
plt.imshow(center);
plt.figure(figsize=(18,12))
plt.imshow(back);
You can also show images side-by-side using the subplots call along with titles. Again, notice which regions are sharp and which are blurry.
#set font size parameter
plt.rcParams.update({'font.size': 30})
fig, (ax1, ax2, ax3) = plt.subplots(1,3,sharey=True, figsize=(30,12))
ax1.imshow(back)
ax1.title.set_text("Back")
ax2.imshow(center)
ax2.title.set_text("Center")
ax3.imshow(front)
ax3.title.set_text("Front")
plt.tight_layout();
Can you now tell why the files have been named the way they have been?
That is right, back.jpg has the camera focusing at the back objects, center.jpg is focused on the central objects while front.jpg is focused on the front mug.
We will now create an image from inside Python. It will be a very basic black and white image, probably only a few clicks in MS Paint, but this basic image will be useful for the rest of the assignment.
A binary image is an image made up of only 0s and 1s. In each channel, 0 means no intensity and 1 means full intensity: a 0 in the red channel means no red, and a 1 in the red channel means full red.
We will create a black-and-white binary image, i.e. the red, green and blue values are equal at every pixel location, and since the image is binary, each of those values is either 0 or 1.
Create a binary image with the same shape as the other three images (all three images have the same shape) in which the first 1340 columns (indices 0 to 1339) are black (all channels zero) and the remaining columns are white (all channels one). Store it in a variable named mask.
See numpy.zeros or numpy.ones, along with what you have learned about indexing numpy ndarrays.
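A minimal sketch of the zeros-then-assign pattern, wrapped in a hypothetical helper make_step_mask (not part of the assignment) and applied to a tiny shape so the result is easy to inspect.

```python
import numpy as np

def make_step_mask(shape, split_col):
    """Hypothetical helper: float mask that is 0 in columns [0, split_col) and 1 from split_col on."""
    mask = np.zeros(shape)        # float64 zeros: every channel black
    mask[:, split_col:, :] = 1    # all rows, columns split_col onward, all channels white
    return mask

small = make_step_mask((2, 5, 3), split_col=3)
print(small[0, :, 0].tolist())    # [0.0, 0.0, 0.0, 1.0, 1.0]
```

Because every channel gets the same value, each pixel is either pure black or pure white. Calling it with the photos' shape and split_col=1340 gives a mask whose columns 0 to 1339 are black.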
import numpy as np
### GRADED
### YOUR ANSWER BELOW
### YOUR SOLUTION HERE
mask = None
### BEGIN SOLUTION
mask = np.zeros(front.shape)
mask[:, 1340:, :] = 1
### END SOLUTION
### BEGIN HIDDEN TESTS
assert len(mask.shape) == 3, "Make sure to have color channels."
assert mask.shape[-1] == 3, "Make sure to have all 3 color channels."
assert mask.dtype == float, "Do not typecast the mask in any way."
assert mask.shape == front.shape, "Wrong shape."
assert np.all(mask[:, :1340] == 0), "Wrong mask."
assert np.all(mask[:, 1340:] == 1), "Wrong mask."
print("Correct!")
### END HIDDEN TESTS
Correct!
plt.imshow(mask);
We will first write a function for computing the Gaussian pyramid of an image. We will use the gaussian function (API Reference: gaussian) from skimage.filters for applying a Gaussian blur to an image.
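Note: Instead of taking a kernel or the size of a kernel, this function takes a value of sigma and a truncate value and computes a kernel internally, cutting it off int(truncate * sigma + 0.5) pixels on each side of the center. A sigma of 1.0 and a truncate value of 3.5 therefore give a 9$\times$9 Gaussian kernel (radius 4); a 7$\times$7 kernel would need, for example, truncate=3.0.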
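skimage.filters.gaussian delegates the filtering to SciPy's scipy.ndimage.gaussian_filter, which truncates the kernel at a radius of int(truncate * sigma + 0.5) pixels, so each axis of the kernel is 2 * radius + 1 wide. The sketch below reproduces that arithmetic and builds the normalized 1-D kernel in NumPy. kernel_size and gaussian_kernel_1d are hypothetical helper names, not library functions.

```python
import numpy as np

def kernel_size(sigma, truncate):
    """Per-axis width of a Gaussian kernel truncated at int(truncate * sigma + 0.5) pixels."""
    radius = int(truncate * sigma + 0.5)
    return 2 * radius + 1

def gaussian_kernel_1d(sigma, truncate):
    """Normalized 1-D Gaussian weights sampled at integer offsets -radius..radius."""
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    w = np.exp(-0.5 * (x / sigma) ** 2)
    return w / w.sum()

print(kernel_size(1.0, 3.5))   # 9  (radius 4)
print(kernel_size(1.0, 3.0))   # 7  (radius 3)
print(kernel_size(10, 4))      # 81 (sigma=10, truncate=4, the gaussianPyramid defaults)
```

Since a 2-D Gaussian is separable, the 2-D kernel is this 1-D kernel applied along rows and then columns, which is why the 2-D size is simply the 1-D width squared (e.g. 9$\times$9).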
from skimage.filters import gaussian
Complete the following function to compute the Gaussian pyramid of an image. The arguments to the function are:
image: The image whose Gaussian pyramid we want to make
sigma: The sigma to pass to the gaussian function from skimage to blur the image
truncate: The truncate value to pass to the gaussian function from skimage
smallestDimension: Keep adding levels until both the height and the width of the smallest image of the Gaussian pyramid are at most this value.
The function should return a list of images, with the first image being the largest (the original image) and the last image being the smallest of the Gaussian pyramid.
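Because slicing with [::2] keeps ceil(n / 2) of n pixels, the size of every pyramid level can be predicted without doing any filtering. The hypothetical helper below mirrors the loop condition used in this assignment (keep going while either dimension exceeds smallestDimension) and lists the level shapes for a 2000 x 3000 image.

```python
import math

def pyramid_shapes(height, width, smallest_dimension=100):
    """Hypothetical helper: (height, width) of each pyramid level under [::2] downsampling."""
    shapes = [(height, width)]
    while height > smallest_dimension or width > smallest_dimension:
        height, width = math.ceil(height / 2), math.ceil(width / 2)
        shapes.append((height, width))
    return shapes

print(pyramid_shapes(2000, 3000))
# [(2000, 3000), (1000, 1500), (500, 750), (250, 375), (125, 188), (63, 94)]
```

So with the default smallestDimension=100, a Gaussian pyramid of a 2000 x 3000 image has six levels, ending at 63 x 94.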
### GRADED
### YOUR ANSWER BELOW
### YOUR SOLUTION HERE
def gaussianPyramid(image, sigma=10, truncate=4, smallestDimension=100):
    pyramid = []
    return pyramid
### BEGIN SOLUTION
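def gaussianPyramid(image, sigma=10, truncate=4, smallestDimension=100):
    # Convert any integer image (e.g. uint8 in [0, 255]) to floats in [0, 1].
    # A uint8 dtype does not compare equal to `int`, so check for any integer dtype.
    if np.issubdtype(image.dtype, np.integer):
        image = image.astype(float) / 255
    pyramid = [image]
    # Blur, then keep every other row and column, until both sides are <= smallestDimension.
    while image.shape[0] > smallestDimension or image.shape[1] > smallestDimension:
        image = gaussian(image, sigma=sigma, truncate=truncate, multichannel=True)[::2, ::2]
        pyramid.append(image)
    return pyramid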
### END SOLUTION
### BEGIN HIDDEN TESTS
def _gaussianPyramid(image, sigma=10, truncate=4, smallestDimension=100):
    if np.issubdtype(image.dtype, np.integer):
        image = image.astype(float) / 255
    pyramid = [image]
    while image.shape[0] > smallestDimension or image.shape[1] > smallestDimension:
        image = gaussian(image, sigma=sigma, truncate=truncate, multichannel=True)[::2, ::2]
        pyramid.append(image)
    return pyramid
maskPyramid = gaussianPyramid(mask, sigma=10, truncate=4, smallestDimension=100)
_maskPyramid = _gaussianPyramid(mask, sigma=10, truncate=4, smallestDimension=100)
assert len(maskPyramid) == len(_maskPyramid), "Number of images in pyramid not correct. Make sure you have the original full size image in the pyramid and the smallest image has both height and width smaller than smallestDimension"
assert [mp.shape for mp in maskPyramid] == [mp.shape for mp in _maskPyramid], \
"The dimensions of the images in the pyramid are incorrect. Make sure the first image in the pyramid is the largest image same as the original full size image and the last one is the smallest with both height and width smaller than smallestDimension."
tol = 1e-3
assert np.all([compare_images(_mp, mp).sum() < tol for mp, _mp in zip(maskPyramid, _maskPyramid)]), "Incorrect pyramid generated."
print("Correct!")
### END HIDDEN TESTS
Correct!
Now we will write a function to generate the Laplacian pyramid of an image, again using the gaussian function from skimage as we did for the Gaussian pyramid. The only difference is that, instead of returning just the pyramid, this function returns a tuple (smallestImage, pyramid), where smallestImage is the smallest image of the Laplacian pyramid and pyramid is a list of residuals at different resolutions, ordered from the largest to the smallest.
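Each level splits the current image into a blurred part, which is downsampled and passed on to the next level, and a residual holding the detail the blur removed. The sketch below shows one such step in plain NumPy. As a stand-in for skimage's gaussian it uses a small separable [1, 2, 1] / 4 binomial blur with edge replication; binomial_blur and laplacian_step are hypothetical names, and the random test image is made up.

```python
import numpy as np

def binomial_blur(image):
    """Blur an (H, W, C) float image with a separable [1, 2, 1] / 4 kernel, replicating edges."""
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)), mode="edge")
    # Vertical pass over rows, then horizontal pass over columns.
    vert = (padded[:-2, :, :] + 2 * padded[1:-1, :, :] + padded[2:, :, :]) / 4
    return (vert[:, :-2, :] + 2 * vert[:, 1:-1, :] + vert[:, 2:, :]) / 4

def laplacian_step(image):
    """Return (downsampled blur, full-resolution residual) for one pyramid level."""
    blurred = binomial_blur(image)
    residual = image - blurred      # high-frequency detail at this resolution
    return blurred[::2, ::2, :], residual

rng = np.random.default_rng(0)
image = rng.random((9, 12, 3))
smaller, residual = laplacian_step(image)
print(smaller.shape, residual.shape)   # (5, 6, 3) (9, 12, 3)
```

The full-resolution blur plus the residual gives back the level exactly. Information is lost only when the blur is downsampled, so rebuilding an image from the smallest image by upsampling is an approximation rather than an exact inverse.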
Complete the following function to compute the Laplacian pyramid of an image.
### GRADED
### YOUR ANSWER BELOW
### YOUR SOLUTION HERE
def laplacianPyramid(image, sigma=10, truncate=4, smallestDimension=100):
    pyramid = []
    smallestImage = None
    return smallestImage, pyramid
### BEGIN SOLUTION
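def laplacian(image, sigma, truncate):
    # Blur once; the residual is the detail the blur removed at this resolution.
    filtered = gaussian(image, sigma=sigma, truncate=truncate, multichannel=True)
    # Return the downsampled blur (next level) and the full-resolution residual.
    return filtered[::2, ::2, :], image - filtered

def laplacianPyramid(image, sigma=10, truncate=4, smallestDimension=100):
    # Assumes a uint8 input; scale it to floats in [0, 1].
    image = image.astype(float) / 255
    pyramid = []
    while image.shape[0] > smallestDimension or image.shape[1] > smallestDimension:
        image, residual = laplacian(image, sigma, truncate)
        pyramid.append(residual)
    # image is now the smallest blurred image; pyramid holds residuals, largest first.
    return image, pyramid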
### END SOLUTION
### BEGIN HIDDEN TESTS
def _laplacian(image, sigma, truncate):
    filtered = gaussian(image, sigma=sigma, truncate=truncate, multichannel=True)
    return filtered[::2,::2,:], image - filtered
def _laplacianPyramid(image, sigma=10, truncate=4, smallestDimension=100):
    image = image.astype(float) / 255
    pyramid = []
    while image.shape[0] > smallestDimension or image.shape[1] > smallestDimension:
        image, residual = _laplacian(image, sigma, truncate)
        pyramid.append(residual)
    return image, pyramid
frontSmallestImage, frontPyramid = laplacianPyramid(front, sigma=10, truncate=4, smallestDimension=100)
_frontSmallestImage, _frontPyramid = _laplacianPyramid(front, sigma=10, truncate=4, smallestDimension=100)
assert frontSmallestImage.shape == _frontSmallestImage.shape, "smallestImage has the wrong dimensions. Make sure it has both height and width smaller than smallestDimension"
assert len(frontPyramid) == len(_frontPyramid), "Number of images in pyramid not correct. Make sure you have the residuals of the original full size image in the pyramid and the smallest image has both height and width smaller than smallestDimension"
assert [fp.shape for fp in frontPyramid] == [fp.shape for fp in _frontPyramid], \
"The dimensions of the images in the pyramid are incorrect. Make sure the first image in the pyramid is the residual of the largest image with the same dimensions as the original full size image and the last one is the smallest with both height and width smaller than smallestDimension."
tol = 1e-3
assert np.all([compare_images(_fp, fp).sum() < tol for fp, _fp in zip(frontPyramid, _frontPyramid)]), "Incorrect pyramid generated."
assert compare_images(frontSmallestImage, _frontSmallestImage).sum() < tol, "Incorrect smallest image. The first returned value should be the smallest image, not its residual."
print("Correct!")
### END HIDDEN TESTS
Correct!
Let us now see how we can get back the original image from the small image and the Laplacian pyramid we computed above.
Write a function that takes two arguments:
small: the smallest image returned by laplacianPyramid
pyramid: the list of residuals returned by laplacianPyramid, ordered from largest to smallest
From that function, return an image reconstructed from the pyramid, with the same dimensions as the first element of the Laplacian pyramid.
To scale an image from dimension $d$ to $D$, where $D > d$, use the resize function (API reference: resize). Use the default values for all the named arguments.
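The reconstruction walks the pyramid from the smallest residual back to the largest: upsample the running image to the residual's shape, then add the residual. Here is a NumPy-only sketch of that loop. Nearest-neighbour upsampling with np.repeat stands in for skimage.transform.resize (which interpolates instead of repeating pixels), and upsample_to and reconstruct are hypothetical names; the toy pyramid values are made up.

```python
import numpy as np

def upsample_to(image, shape):
    """Nearest-neighbour upsample by 2 along rows and columns, then crop to `shape`."""
    up = np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)
    return up[:shape[0], :shape[1], :]

def reconstruct(small, pyramid):
    """Rebuild the full-size image from the smallest image and a largest-first list of residuals."""
    image = small
    for residual in reversed(pyramid):   # smallest residual first
        image = upsample_to(image, residual.shape) + residual
    return image

# Toy pyramid for an 8 x 11 image: residuals are 8 x 11 and 4 x 6, the smallest image is 2 x 3.
small = np.full((2, 3, 3), 0.5)
pyramid = [np.zeros((8, 11, 3)), np.zeros((4, 6, 3))]
out = reconstruct(small, pyramid)
print(out.shape)   # (8, 11, 3)
```

Doubling always suffices here because [::2] keeps ceil(n / 2) pixels, so twice that is at least n; the crop then trims any extra row or column.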
from skimage.transform import resize
### GRADED
### YOUR ANSWER BELOW
### YOUR SOLUTION HERE
def pyramidToImage(small, pyramid):
    image = None
    return image
### BEGIN SOLUTION
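def pyramidToImage(small, pyramid):
    image = small
    # Walk the residuals from the smallest to the largest.
    for residual in reversed(pyramid):
        # Upsample to this level's size, then add back the detail removed at this level.
        image = resize(image, residual.shape)
        image += residual
    return image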
### END SOLUTION
### BEGIN HIDDEN TESTS
def _pyramidToImage(small, pyramid):
    image = small
    for residual in reversed(pyramid):
        image = resize(image, residual.shape)
        image += residual
    return image
image = pyramidToImage(_frontSmallestImage, _frontPyramid)
_image = _pyramidToImage(_frontSmallestImage, _frontPyramid)
assert len(image.shape) == 3, "The returned image should be an RGB image. Your image does not seem to have the right dimensions. Use image.shape for sanity check."
assert image.shape[-1] == 3, "The returned image should be an RGB image with the channels as the last dimension of the image. Use image.shape for sanity check."
assert image.shape == _frontPyramid[0].shape, "The returned image should have the same size as the first element of the Laplacian pyramid."
tol = 1e-3
assert compare_images(image, _image).sum() < tol, "Incorrect reconstructed image. Make sure to use resize from skimage.transform with the default arguments unchanged. The operation is just the inverse of what you did to generate a Laplacian pyramid, so do not do any kind of non-linear operations or type casting."
print("Correct!")
### END HIDDEN TESTS
Correct!
Let us see how we can use Gaussian and Laplacian pyramids for image blending. We will merge back.jpg with front.jpg to generate an image in which both the front and back objects are in focus, but not the middle objects.
First, let us do it the naive way by having 50% transparency for both the images.
naiveMerged = (0.5 * front + 0.5 * back).astype(np.uint8)
plt.figure(figsize=(18,12))
plt.imshow(naiveMerged);
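In general, alpha * front + (1 - alpha) * back is an alpha-weighted blend, and the naive merge above uses alpha = 0.5. Multiplying by 0.5 first also promotes the uint8 images to float, which matters: adding two uint8 arrays directly wraps around modulo 256. A small NumPy check with made-up pixel values:

```python
import numpy as np

a = np.array([200, 10], dtype=np.uint8)
b = np.array([100, 20], dtype=np.uint8)

wrapped = (a + b) // 2                           # uint8 addition overflows: 300 wraps to 44
blended = (0.5 * a + 0.5 * b).astype(np.uint8)   # float arithmetic, then cast back to uint8

print(wrapped.tolist())   # [22, 15]
print(blended.tolist())   # [150, 15]
```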
Notice that the whole image has a soft blurriness, like the kind you would typically see with a dreamy filter. While that might be an artistic effect you want in some other scenario, it is not what we want when blending two images. Also, if the two images were very different, e.g. blending an image of an apple with an image of an orange, the result would show ghosting rather than blurriness.
Can we do better with a Laplacian pyramid?
Let's start with generating a Gaussian pyramid of the mask.
maskPyramid = gaussianPyramid(mask, sigma=30)
Then compute the Laplacian pyramids of both the front and back images.
frontSmall, frontPyramid = laplacianPyramid(front)
backSmall, backPyramid = laplacianPyramid(back)
Now let's merge the front and back Laplacian pyramids while masking with the appropriate mask at each level.
Complete the code below. This should be very similar to recreating an image from its Laplacian pyramid except that at each stage we will also mask and combine with the other pyramid similar to what we did for the naive merging. Store it in a variable named smartMerge.
One easy way of doing this would be to merge the two Laplacian pyramids together using the Gaussian pyramid to form a new Laplacian pyramid and then use the pyramidToImage function defined above to get the final image.
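At every level the merge is the same per-pixel weighted average as the naive blend, except that the weight comes from the mask's Gaussian pyramid at that resolution. A minimal sketch on toy arrays; blend_pyramids is a hypothetical name, and the pyramids are assumed to be largest-first lists whose corresponding levels have matching shapes.

```python
import numpy as np

def blend_pyramids(pyr_a, pyr_b, mask_pyr):
    """Per-level blend: mask * a + (1 - mask) * b. zip stops at the shortest list."""
    return [m * a + (1 - m) * b for a, b, m in zip(pyr_a, pyr_b, mask_pyr)]

shapes = [(4, 4, 3), (2, 2, 3)]
pyr_a = [np.ones(s) for s in shapes]     # toy "front" residuals
pyr_b = [np.zeros(s) for s in shapes]    # toy "back" residuals
# The mask pyramid has one more level than the residual list, like a Gaussian pyramid would.
mask_pyr = [np.full(s, 0.25) for s in shapes] + [np.full((1, 1, 3), 0.25)]

merged = blend_pyramids(pyr_a, pyr_b, mask_pyr)
print(len(merged), merged[0][0, 0].tolist())   # 2 [0.25, 0.25, 0.25]
```

The extra, smallest mask level is the one zip leaves over; it has the same shape as the smallest image of each Laplacian pyramid, so it is the natural weight for blending those two small images.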
### GRADED
### YOUR ANSWER BELOW
### YOUR SOLUTION HERE
smartMerge = None
### BEGIN SOLUTION
maskSmall = maskPyramid[-1]
mergedSmall = maskSmall * frontSmall + (1 - maskSmall) * backSmall
mergedPyramid = [maskP * frontP + (1 - maskP) * backP for frontP, backP, maskP in zip(frontPyramid, backPyramid, maskPyramid)]
smartMerge = pyramidToImage(mergedSmall, mergedPyramid)
### END SOLUTION
### BEGIN HIDDEN TESTS
assert len(smartMerge.shape) == 3, "The merged image should be an RGB image. Your image does not seem to have the right dimensions. Use smartMerge.shape for sanity check."
assert smartMerge.shape[-1] == 3, "The merged image should be an RGB image with the channels as the last dimension of the image. Use smartMerge.shape for sanity check."
_maskPyramid = _gaussianPyramid(mask, sigma=30)
_frontSmall, _frontPyramid = _laplacianPyramid(front)
_backSmall, _backPyramid = _laplacianPyramid(back)
_maskSmall = _maskPyramid[-1]
_mergedSmall = _maskSmall * _frontSmall + (1 - _maskSmall) * _backSmall
_mergedPyramid = [maskP * frontP + (1 - maskP) * backP for frontP, backP, maskP in zip(_frontPyramid, _backPyramid, _maskPyramid)]
_smartMerge = _pyramidToImage(_mergedSmall, _mergedPyramid)
tol = 1e-3
assert compare_images(smartMerge, _smartMerge).sum() < tol, "Incorrect merging."
print("Correct!")
### END HIDDEN TESTS
Correct!
plt.figure(figsize=(18,12))
plt.imshow(smartMerge);
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
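The warning appears because the blended pyramid can produce values below 0 or above 1 once the residuals are added back, and imshow clips them for display. To clip explicitly, for example before saving the result, one option (a sketch, not part of the graded work) is to clip to [0, 1] and convert to 8-bit; to_uint8 is a hypothetical helper and the sample values are made up.

```python
import numpy as np

def to_uint8(image):
    """Clip a float image to [0, 1] and scale it to 8-bit integers in [0, 255]."""
    return np.round(np.clip(image, 0.0, 1.0) * 255).astype(np.uint8)

x = np.array([-0.1, 0.0, 0.5, 1.0, 1.2])
print(to_uint8(x).tolist())   # [0, 0, 128, 255, 255]
```

For instance, plt.imshow(to_uint8(smartMerge)) displays the same picture with the clipping done up front.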
Now that you've completed the assignment, think about each of the methods above and their impact on the resulting images. What was visually different between each of the blended images? How were those results generated and how did pyramids affect the results? Reflect on these guiding questions before considering the challenge below.
A common example of image blending is blending an image of an apple with an image of an orange. So here is a task for you:
Take two photos (fruit_01.png, fruit_02.png) of two pieces of fruit of similar size and dimensions. Try to make the images as similar as possible except for the fruit in the foreground. That is, the only easily discernible difference between the photos should be the fruit itself.